An Adversarial Analysis of the Reidentifiability of the Heritage Health Prize Dataset
Author
Abstract
I analyze the reidentifiability of the Heritage Health Prize dataset, taking into account the auxiliary information available online and offline to a present-day adversary. A key technique is identifying providers, which is useful both as an end in itself and as a stepping stone toward identifying members. My primary findings are:
1. Grouping providers based on shared members produces clusters that likely correspond to hospitals.
2. There is enough auxiliary information to identify most of these hospitals, and possibly also individual providers.
3. An adversary who has detailed information about a member's health conditions will be able to uniquely identify him or her.
4. While there are numerous websites where users can share reviews, health conditions, etc., their adoption is not currently high enough to serve as a source of auxiliary information for a large-scale member-reidentification attack.
I provide bounds on the efficacy of the methods I describe, but time constraints prevented me from attempting a more complete attack. In my judgment, reidentification is within the realm of possibility; however, it is far from straightforward and will require algorithmic sophistication as well as sleuthing for auxiliary data. While identifying providers might help contestants improve predictive performance, large-scale reidentification of members, which could threaten both privacy and the fidelity of the contest, appears unlikely to be feasible given the paucity of auxiliary information.
∗Web: http://randomwalker.info/
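Findings 1 and 3 rest on two simple computations, sketched below. This is a minimal illustration under stated assumptions, not the author's actual procedure. Claims are modeled as hypothetical `(member_id, provider_id)` pairs. Two providers are linked when at least `min_shared` distinct members saw both (the threshold is an assumed parameter), and clusters are taken to be the connected components of that provider graph. The hypothetical `anonymity_set` function returns the members whose recorded conditions include everything an adversary knows; a set of size one means the member is uniquely identified.

```python
from collections import defaultdict
from itertools import combinations


def provider_clusters(claims, min_shared=2):
    """Cluster providers that share members (illustrative sketch).

    claims: iterable of (member_id, provider_id) pairs; this schema is an
    assumption, not the dataset's exact layout. Providers are linked when
    at least `min_shared` distinct members saw both; clusters are the
    connected components of the resulting graph.
    """
    providers_by_member = defaultdict(set)
    for member, provider in claims:
        providers_by_member[member].add(provider)

    # Count distinct shared members for each unordered provider pair.
    shared = defaultdict(int)
    for provs in providers_by_member.values():
        for a, b in combinations(sorted(provs), 2):
            shared[(a, b)] += 1

    # Union-find over providers, with path halving.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Register every provider so isolated ones form singleton clusters.
    for provs in providers_by_member.values():
        for p in provs:
            find(p)
    for (a, b), n in shared.items():
        if n >= min_shared:
            union(a, b)

    groups = defaultdict(set)
    for p in list(parent):
        groups[find(p)].add(p)
    # Largest clusters first; ties broken by sorted provider IDs.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


def anonymity_set(member_conditions, known):
    """Members whose recorded conditions include every condition in `known`."""
    known = set(known)
    return {m for m, conds in member_conditions.items() if known <= set(conds)}


if __name__ == "__main__":
    # Toy data (hypothetical IDs and condition labels).
    claims = [("m1", "P1"), ("m1", "P2"), ("m2", "P1"), ("m2", "P2"),
              ("m3", "P2"), ("m3", "P3"), ("m4", "P4"), ("m4", "P5"),
              ("m5", "P4"), ("m5", "P5")]
    print(provider_clusters(claims))
    conditions = {"m1": ["diabetes", "asthma"], "m2": ["diabetes"], "m3": ["asthma"]}
    print(anonymity_set(conditions, ["diabetes", "asthma"]))
```

On the toy data, P1 and P2 share two members and fall into one cluster. P3 shares only one member with P2, so it stays separate at `min_shared=2` and merges with them at `min_shared=1`. Raising the threshold trades fewer spurious links for smaller clusters. Knowing both conditions narrows the candidates to a single member, which is the uniqueness effect described in finding 3.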
Similar resources
De-identification Methods for Open Health Data: The Case of the Heritage Health Prize Claims Dataset
BACKGROUND There are many benefits to open datasets. However, privacy concerns have hampered the widespread creation of open health data. There is a dearth of documented methods and case studies for the creation of public-use health data. We describe a new methodology for creating a longitudinal public health dataset in the context of the Heritage Health Prize (HHP). The HHP is a global data mi...
Turn-Taking, Preference, and Face in Criticism Responses
Vivas have multiple functions in academia, but their main goal is completing thesis evaluation. At the heart of this evaluation is a series of criticisms and their responsive turns by which participants talk vivas as institution into being (Heritage, 1997). Turn-taking is one of the many ways vivas are talked into being. This study drew upon conversation analysis to look into the turn allocatio...
Robust Opponent Modeling in Real-Time Strategy Games using Bayesian Networks
Opponent modeling is a key challenge in Real-Time Strategy (RTS) games as the environment is adversarial in these games, and the player cannot predict the future actions of her opponent. Additionally, the environment is partially observable due to the fog of war. In this paper, we propose an opponent model which is robust to the observation noise existing due to the fog of war. In order to cope...
Exploration and analyzing the value of Industrial architectural heritage conservation, Case study: Cement Factory of Shahr-e-Rey
Abstract: As a cultural heritage, industrial buildings are prone to various changes and transformations to the extent that they are practically susceptible to complete destruction. Accordingly, such spaces which should be considered lively places have unfortunately turned into the symbols of urban disorder. also, an industrial heritage covers social, economic, and cultural values. Therefore, de...
Automatic Colorization of Grayscale Images Using Generative Adversarial Networks
Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...
Publication date: 2014